Identifying Data Noises, User Biases, and System Errors in Geo-tagged Twitter Messages (Tweets)
Abstract
Many social media researchers and data scientists collect geo-tagged tweets to conduct spatial analysis or to identify spatiotemporal patterns of messages filtered for specific topics or events. This paper provides a systematic view of the characteristics (data noise, user biases, and system errors) of geo-tagged tweets from the Twitter Streaming API. First, we found that a small percentage (1%) of active Twitter users can create a large portion (16%) of geo-tagged tweets. Second, a significant share (57.3%) of geo-tagged tweets are located outside the Twitter Streaming API’s bounding box in San Diego. Third, we can detect spam, bot, and cyborg tweets (data noise) by examining the “source” metadata field. The portion of data noise in geo-tagged tweets is significant (29.42% in San Diego, CA and 53.47% in Columbus, OH) in our case study. Finally, the majority of geo-tagged tweets are not created by the generic Twitter apps on Android or iPhone devices, but by other platforms, such as Instagram and Foursquare. We recommend a multi-step procedure to remove this noise in future research projects that use geo-tagged tweets.
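The abstract does not spell out the multi-step cleaning procedure, so the following is only a minimal sketch of the kinds of checks it mentions: measuring how much of the data comes from the most active users, dropping tweets outside the requested bounding box, and filtering on the "source" field. The record layout (`user`, `lon`, `lat`, `source`), the bounding-box coordinates, and the allowlist of sources are all hypothetical assumptions, not values from the paper. The sketch also assumes the "source" value has already been reduced to a plain application name.

```python
from collections import Counter

# Hypothetical bounding box as (west, south, east, north); not the paper's values.
BBOX = (-117.6, 32.5, -116.1, 33.5)

# Hypothetical allowlist of "source" values treated as human-operated clients.
TRUSTED_SOURCES = {
    "Twitter for iPhone",
    "Twitter for Android",
    "Instagram",
    "Foursquare",
}


def in_bbox(lon, lat, bbox=BBOX):
    """Return True if the point lies inside the bounding box (edges included)."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def top_user_share(tweets, fraction=0.01):
    """Share of all tweets produced by the most active `fraction` of users."""
    counts = Counter(t["user"] for t in tweets)
    if not counts:
        return 0.0
    # Always include at least one user so small samples still give a value.
    n_top = max(1, int(len(counts) * fraction))
    top_total = sum(c for _, c in counts.most_common(n_top))
    return top_total / len(tweets)


def clean(tweets, bbox=BBOX, trusted=TRUSTED_SOURCES):
    """Keep tweets that fall inside the box and come from an allowlisted source."""
    return [
        t for t in tweets
        if in_bbox(t["lon"], t["lat"], bbox) and t["source"] in trusted
    ]
```

An allowlist is only one possible design; a blocklist of known automated sources would serve the same purpose, and the abstract does not say which approach the authors took.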
Related articles
Scaling laws in geo-located Twitter data
We observe and report on a systematic relationship between population density and Twitter use. Number of tweets, number of users and population per unit area are related by power laws, with exponents greater than one, that are consistent with each other and across a range of spatial scales. This implies that population density can accurately predict Twitter activity. Furthermore this trend can ...
Detecting Emergency Events and Geo-Location Awareness from Twitter Streams
The rapidly increasing number of messages on Twitter is quite interesting. Through Twitter streaming, this paper is capable of delivering tweets for any keyword or hashtag from clients all around the world in real time. However, semantic topic extraction and tracking of user-interested news events from messages on Twitter can be considered a challenging task. In this paper focused on detect...
Deriving retail centre locations and catchments from geo-tagged Twitter data
Article history: received 13 January 2016; received in revised form 20 August 2016; accepted 28 September 2016; available online 20 October 2016. This investigation offers an initial foray into the application of geo-tagged Twitter data for generating insights within two areas of retail geography: establishing retail centre locations and defining catchment areas. Retail related Tweets were identifi...
Visualizing User-Defined, Discriminative Geo-Temporal Twitter Activity
We present a system that visualizes geo-temporal Twitter activity. The distinguishing features our system offers include, (i) a large degree of user freedom in specifying the subset of data to visualize and (ii) a focus on discriminative patterns rather than high volume patterns. Tweets with precise GPS co-ordinates are assigned to geographical cells and grouped by (i) tweet language, (ii) twee...
Discover Patterns and Mobility of Twitter Users - A Study of Four US College Cities
Geo-tagged tweets provide useful implications for studies in human geography, urban science, location-based services, targeted advertising, and social network. This research aims to discover the patterns and mobility of Twitter users by analyzing the spatial and temporal dynamics in their tweets. Geo-tagged tweets are collected over a period of six months for four US Midwestern college cities: ...
Journal:
- CoRR
Volume: abs/1712.02433
Publication date: 2017